Home » SQL & PL/SQL » SQL & PL/SQL » Fine tuning a delete statement
Fine tuning a delete statement [message #48363] Tue, 16 November 2004 12:59 Go to next message
Leonard Martin
Messages: 45
Registered: May 2002
Location: Canada
Member
Hello Oracle gurus
I have the following delete statement that would require to delete about
70,000 records. How could I fine tune this statement. If I use a cursor, how could
I frame the cursors to delete. The situation here is a NOT IN.

DELETE
      FROM A
      WHERE UNIQUE_ID NOT IN (SELECT unique_id FROM B WHERE filename='TEMP.DAT')

 

Thanks in advance
Re: Fine tuning a delete statement [message #48364 is a reply to message #48363] Tue, 16 November 2004 13:45 Go to previous messageGo to next message
Edward Stoever
Messages: 58
Registered: August 2002
Member
Avoid "NOT IN" whenever possible. It's not efficient.

Try turning your DELETE statement into a SELECT statement. Then you can compare different queries you come up with before actually deleting anything.

I don't know your data; it may work out to do something like this:
Select * FROM A
      WHERE UNIQUE_ID IN (SELECT unique_id FROM B WHERE filename<>'TEMP.DAT')


But, that would depend how tightly tied together the two tables are.

Edward
;-) -- www.database-expert.com
Re: Fine tuning a delete statement [message #48367 is a reply to message #48363] Tue, 16 November 2004 22:38 Go to previous messageGo to next message
kil
Messages: 10
Registered: April 2004
Junior Member
Could this work ?

delete from a
where exists(select 1
from b
where b.filename <> 'TEMP.DAT'
and b.unique_id = a.unique_id)
Re: Fine tuning a delete statement [message #48370 is a reply to message #48363] Wed, 17 November 2004 03:51 Go to previous messageGo to next message
shoblock
Messages: 325
Registered: April 2004
Senior Member
First - "NOT IN" is not inefficient. It is very efficient when used against small amounts of data (if the subquery returns 1 row, then NOT IN may be the best solution ever).

Rewrite your sql - if sql, anything can be written multiple ways. try them and determine which works best.

DELETE A
WHERE UNIQUE_ID NOT IN (SELECT unique_id FROM B WHERE filename='TEMP.DAT')

where not exists (
select null from b where b.unique_id = a.unique_id
and b.filename='TEMP.DAT')

where unique_id in (
select unique_id from a
minus
select unique_id FROM B WHERE filename='TEMP.DAT'
)
Re: Fine tuning a delete statement [message #48372 is a reply to message #48367] Wed, 17 November 2004 04:44 Go to previous messageGo to next message
William Robertson
Messages: 1643
Registered: August 2003
Location: London, UK
Senior Member
It will work, but it could give different results. Your original query looks for UNIQUE_ID values that do not occur in B for filename 'TEMP.DAT'. This one looks for UNIQUE_ID values that occur for any filename that is not 'TEMP.DAT'. This requires a row to exist in B, while the original query does not.
Re: Fine tuning a delete statement [message #48376 is a reply to message #48364] Wed, 17 November 2004 05:34 Go to previous messageGo to next message
William Robertson
Messages: 1643
Registered: August 2003
Location: London, UK
Senior Member
> Avoid "NOT IN" whenever possible. It's not efficient.

Really? Perhaps NOT EXISTS is faster?

SQL*Plus: Release 9.2.0.1.0 - Developer's Release on Wed Nov 17 14:20:18 2004

<b>BIG_TABLE is a few columns from DBA_OBJECTS, duplicated a few times:</b>

SQL> desc big_table
 Name                                Null?    Type
 ----------------------------------- -------- ------------------------
 OBJECT_ID                           NOT NULL NUMBER
 OBJECT_NAME                                  VARCHAR2(128)
 OBJECT_TYPE                                  VARCHAR2(18)

SQL> SELECT COUNT(*) FROM big_table;

  COUNT(*)
----------
    390000

1 row selected.

<b>SMALL_TABLE is a copy of USER_OBJECTS with a unique index on OBJECT_ID:</b>

SQL> desc small_table
 Name                                                        Null?    Type
 ----------------------------------------------------------- -------- -----------------------------------------
 OBJECT_NAME                                                          VARCHAR2(128)
 SUBOBJECT_NAME                                                       VARCHAR2(30)
 OBJECT_ID                                                   NOT NULL NUMBER
 DATA_OBJECT_ID                                                       NUMBER
 OBJECT_TYPE                                                          VARCHAR2(18)
 CREATED                                                              DATE
 LAST_DDL_TIME                                                        DATE
 TIMESTAMP                                                            VARCHAR2(19)
 STATUS                                                               VARCHAR2(7)
 TEMPORARY                                                            VARCHAR2(1)
 GENERATED                                                            VARCHAR2(1)
 SECONDARY                                                            VARCHAR2(1)

SQL> SELECT COUNT(*) FROM small_table;

  COUNT(*)
----------
       203

1 row selected.

<b>Check indexes (one on SMALL_TABLE, none on BIG_TABLE):</b>

SQL> @ind big_table
SQL>
SQL> @ind small_table
                                                                                 Distinct
Index                          Unique? Type              Status          Rows        keys Column name
------------------------------ ------- ----------------- -------- ----------- ----------- -------------------------
SMALL_PK                       Y       NORMAL            VALID            203         203 OBJECT_ID

<b>Both tables are analyzed:</b>

SQL> SELECT table_name, num_rows FROM user_tables WHERE table_name IN ('SMALL_TABLE','BIG_TABLE');

TABLE_NAME                       NUM_ROWS
------------------------------ ----------
BIG_TABLE                          390000
SMALL_TABLE                           203

2 rows selected.

<b>Test NOT EXISTS query:</b>

SQL> SELECT COUNT(*)
  2  FROM   big_table b
  3  WHERE  NOT EXISTS
  4         ( SELECT 1 FROM small_table s
  5           WHERE  s.object_id = b.object_id );

  COUNT(*)
----------
    380336

1 row selected.

Elapsed: 00:00:41.98

SQL> @xplan

--------------------------------------------------------------------
| Id  | Operation            |  Name       | Rows  | Bytes | Cost  
--------------------------------------------------------------------
|   0 | SELECT STATEMENT     |             |     1 |     4 |   404 
|   1 |  SORT AGGREGATE      |             |     1 |     4 |       
&#124*  2 |   FILTER             |             |       |       |       
|   3 |    TABLE ACCESS FULL | BIG_TABLE   | 19500 | 78000 |   403 
&#124*  4 |    INDEX UNIQUE SCAN | SMALL_PK    |     1 |     4 |       
--------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter( NOT EXISTS (SELECT /*+ */ 0 FROM "SMALL_TABLE" "S"
               WHERE "S"."OBJECT_ID"=:B1))
   4 - access("S"."OBJECT_ID"=:B1)

Note: cpu costing is off

<b>Test NOT IN query:</b>

SQL> SELECT COUNT(*)
  2  FROM   big_table b
  3  WHERE  b.object_id NOT IN
  4         ( SELECT s.object_id
  5*          FROM   small_table s )

  COUNT(*)
----------
    380336

1 row selected.

Elapsed: 00:00:15.91

SQL> @xplan

--------------------------------------------------------------------
| Id  | Operation            |  Name       | Rows  | Bytes | Cost  
--------------------------------------------------------------------
|   0 | SELECT STATEMENT     |             |     1 |     8 |   403 
|   1 |  SORT AGGREGATE      |             |     1 |     8 |       
|   2 |   NESTED LOOPS ANTI  |             |   376K|  2944K|   403 
|   3 |    TABLE ACCESS FULL | BIG_TABLE   |   390K|  1523K|   403 
&#124*  4 |    INDEX UNIQUE SCAN | SMALL_PK    |     7 |    28 |       
--------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - access("B"."OBJECT_ID"="S"."OBJECT_ID")

Note: cpu costing is off

<b>Repeat in case the difference was to do with block caching etc:</b>

SQL> SELECT COUNT(*)
  2  FROM   big_table b
  3  WHERE  NOT EXISTS
  4         ( SELECT 1 FROM small_table s
  5*          WHERE  s.object_id = b.object_id );

  COUNT(*)
----------
    380336

1 row selected.

Elapsed: 00:00:54.58

SQL> SELECT COUNT(*)
  2  FROM   big_table b
  3  WHERE  b.object_id NOT IN
  4*        ( SELECT s.object_id FROM small_table s );

  COUNT(*)
----------
    380336

1 row selected.

Elapsed: 00:00:15.56

SQL> 
Re: Fine tuning a delete statement [message #48396 is a reply to message #48372] Wed, 17 November 2004 22:38 Go to previous messageGo to next message
kil
Messages: 10
Registered: April 2004
Junior Member
So it should be :

delete from a
where not exists(select 1
from b
where b.filename = 'TEMP.DAT'
and b.unique_id = a.unique_id)

I think the not exists would be faster than the not in statement.
Re: Fine tuning a delete statement [message #48407 is a reply to message #48396] Thu, 18 November 2004 07:05 Go to previous message
William Robertson
Messages: 1643
Registered: August 2003
Location: London, UK
Senior Member
(Not sure what happened to my post above - looks fine in preview and on the test forum, so trying again...)

Yes, that should do it.

The problem with the earlier query arose if table B was (extreme example) empty. Leonard's original query,
DELETE FROM a
WHERE  unique_id NOT IN
       ( SELECT unique_id
         FROM   b
         WHERE  filename='TEMP.DAT' );

would delete all rows from A, but
DELETE FROM a
WHERE  EXISTS
       ( SELECT 1
         FROM   b
         WHERE  b.unique_id = a.unique_id 
         AND    b.filename <> 'TEMP.DAT' );

or for that matter,
DELETE FROM a
WHERE  unique_id IN
       ( SELECT unique_id
         FROM   b
         WHERE  filename <> 'TEMP.DAT' );

would delete nothing.
DELETE FROM a
WHERE  NOT EXISTS
       ( SELECT 1
         FROM   b
         WHERE  b.unique_id = a.unique_id
         AND    b.filename = 'TEMP.DAT' );

is a valid variation on the original query. It might be faster, or slower, or come to the same thing, depending on data volumes, distribution, optimizer settings etc. See www.orafaq.com/msgboard/plsql/messages/19327.htm.

Actually now I think about it, NOT IN and NOT EXISTS are not be quite the same thing if b.unique_id can be null:

SQL> CREATE TABLE a AS SELECT rownum AS unique_id FROM user_objects;

Table created.

SQL> CREATE TABLE b AS SELECT unique_id, 'TEMP.DAT' AS filename FROM a;

Table created.

SQL> update b set unique_id = null where rownum = 1;

1 row updated.

SQL> SELECT unique_id FROM a
  2  MINUS
  3  SELECT unique_id FROM b;

 UNIQUE_ID
----------
         1

1 row selected.

SQL> commit;

Commit complete.

SQL> DELETE FROM a
  1  WHERE  unique_id NOT IN
  2         ( SELECT unique_id
  3           FROM   b
  4           WHERE  filename='TEMP.DAT' );

<b>0 rows deleted.</b>

SQL> DELETE FROM a
  2  WHERE  NOT EXISTS
  3         ( SELECT 1
  4           FROM   b
  5           WHERE  b.unique_id = a.unique_id
  6*          AND    b.filename = 'TEMP.DAT' );

<b>1 row deleted.</b>

SQL> roll
Rollback complete.

SQL> DELETE FROM a
  2  WHERE  unique_id NOT IN
  3         ( SELECT NVL(unique_id,-1)
  4           FROM   b
  5*          WHERE  filename='TEMP.DAT' );

<b>1 row deleted.</b>

[Updated on: Fri, 18 February 2005 23:32]

Report message to a moderator

Previous Topic: D2KWutil and WebUtil info needed please
Next Topic: Moving a file inside a Procedure
Goto Forum:
  


Current Time: Sun Dec 22 05:53:55 CST 2024